Pseudo amino acid composition, also called PseAA composition or Chou's PseAAC, was introduced by Kuo-Chen Chou in 2001 to represent protein samples. Its purpose was to improve the prediction of protein subcellular localization and of membrane protein type.

== Background ==

To predict the subcellular localization and other attributes of a protein from its sequence, two kinds of models are generally used to represent protein samples: (1) the sequential model and (2) the non-sequential, or discrete, model.

The most typical sequential representation of a protein sample is its entire amino acid (AA) sequence, which contains the most complete information about the protein. This is the obvious advantage of the sequential model. Predictions based on it are usually made with sequence-similarity search tools. However, this approach fails when a query protein has no significant homology to any known protein. For this reason, various discrete models that do not rely on sequence order were proposed.

The simplest discrete model represents a protein sample by its amino acid composition (AAC), formulated as follows. Given a protein sequence P with L amino acid residues, i.e.,

$$P = \mathrm{R}_1 \mathrm{R}_2 \mathrm{R}_3 \mathrm{R}_4 \mathrm{R}_5 \cdots \mathrm{R}_L \tag{1}$$

where R1 represents the 1st residue of P, R2 the 2nd residue, and so forth, the AAC model expresses the protein P of Eq. 1 as

$$P = \left[ f_1 \; f_2 \; \cdots \; f_{20} \right]^{\mathrm{T}} \tag{2}$$

where $f_u$ ($u = 1, 2, \ldots, 20$) are the normalized occurrence frequencies of the 20 native amino acids in P, and T is the transpose operator. The amino acid composition of a protein can therefore be derived easily once its sequence is known.

Owing to its simplicity, the AAC model was widely used in many earlier statistical methods for predicting protein attributes. However, it discards all sequence-order information, which is its main shortcoming.

Source: Wikipedia, the free encyclopedia.
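The AAC representation described above can be computed in a few lines. Here is a minimal Python sketch. It assumes the protein is given as a string of one-letter codes for the 20 standard amino acids, and that each frequency is the residue count divided by the sequence length L (f_u = n_u / L). The helper name `aac_vector` is hypothetical. The alphabetical ordering of the 20 components and the choice to reject non-standard letters (such as X or B) are illustrative assumptions, not part of the original formulation. The example also demonstrates the shortcoming noted above: reordering the residues leaves the vector unchanged.

```python
from collections import Counter

# The 20 native (standard) amino acids as one-letter codes.
# Assumption: alphabetical order; the AAC model only requires some fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(sequence: str) -> list[float]:
    """Return [f_1, ..., f_20], the normalized occurrence frequency of each
    native amino acid in the sequence (hypothetical helper, for illustration)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: ambiguous or non-standard codes (X, B, Z, ...) are rejected
    # rather than silently dropped.
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)  # n_u for each residue type present
    length = len(seq)      # L, the number of residues
    return [counts[aa] / length for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Two sequences containing the same residues in a different order...
    p1 = "MKTAYIAKQR"
    p2 = "RQKAIYATKM"
    # ...map to exactly the same AAC vector, so sequence order is lost.
    print(aac_vector(p1) == aac_vector(p2))  # True
```

Because every component is a count divided by the same length L, the 20 frequencies always sum to 1. The vector records how often each residue occurs but not where in the chain it occurs, which is exactly the sequence-order information that the AAC model gives up.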